Tracking topics in broadcast news data
Abstract
This paper describes a topic tracking system and its ability to cope with sparse training data for broadcast news tracking. The baseline tracker relies on a unigram topic model. To compensate for the very small amount of training data available for each topic, document expansion is used to estimate the initial topic model, and unsupervised model adaptation is carried out after each test story is processed. A new technique, variable-weight unsupervised online adaptation, was developed and found to outperform traditional fixed-weight online adaptation. Combining document expansion and adaptation resulted in a 37% cost reduction on both English and machine-translated Mandarin broadcast news data transcribed by an ASR system, with manual story boundaries. Another challenging condition is one in which the story boundaries of the broadcast news data are not known. For this condition, a window-based automatic story boundary detector was developed for the tracking system. Tracking results with the window-based system are comparable to those obtained with a state-of-the-art automatic story segmentation system on the TDT3 corpus.
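The tracking loop described above (score each incoming story against a unigram topic model, then adapt the model without supervision) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's system: the add-one smoothing, the background interpolation weight `lam`, the average log-likelihood-ratio score, the `threshold`, and the linear variable-weight rule are all hypothetical choices. Stories are assumed to be token lists, and document expansion and the window-based boundary detector are not shown.

```python
import math
from collections import Counter


def unigram(counts, vocab_size, add=1.0):
    """Add-one smoothed unigram probabilities over a fixed vocabulary size (assumed)."""
    total = sum(counts.values())
    return lambda w: (counts.get(w, 0.0) + add) / (total + add * vocab_size)


class TopicTracker:
    """Hypothetical unigram topic tracker with unsupervised online adaptation.

    Assumptions (not from the paper): the topic model is interpolated with a
    background model, a story is scored by its average per-token
    log-likelihood ratio, and the variable adaptation weight grows linearly
    with the margin by which the score clears the threshold.
    """

    def __init__(self, training_stories, background_counts, vocab_size,
                 threshold=0.5, lam=0.8, fixed_weight=None):
        self.counts = Counter()
        for story in training_stories:  # pool the few on-topic training stories
            self.counts.update(story)
        self.background = unigram(background_counts, vocab_size)
        self.vocab_size = vocab_size
        self.threshold = threshold
        self.lam = lam
        self.fixed_weight = fixed_weight  # None selects the variable-weight rule

    def score(self, story):
        """Average log P(w | topic) / P(w | background) over the story's tokens."""
        topic = unigram(self.counts, self.vocab_size)
        llr = 0.0
        for w in story:
            p_b = self.background(w)
            p_t = self.lam * topic(w) + (1 - self.lam) * p_b
            llr += math.log(p_t / p_b)
        return llr / max(len(story), 1)

    def process(self, story):
        """Score one test story, decide on/off topic, and adapt if on topic."""
        s = self.score(story)
        on_topic = s > self.threshold
        if on_topic:
            if self.fixed_weight is not None:
                weight = self.fixed_weight  # traditional fixed-weight adaptation
            else:
                # Hypothetical variable weight: larger margin, more trust, capped at 1.
                weight = min(1.0, s - self.threshold)
            for w, c in Counter(story).items():
                self.counts[w] += weight * c
        return s, on_topic
```

Passing `fixed_weight=1.0` gives a fixed-weight baseline for comparison. With `fixed_weight=None`, a story that only barely clears the threshold contributes little to the updated topic model, which is one plausible way to limit the damage a false alarm does during unsupervised adaptation.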
Similar resources
Large, Multilingual, Broadcast News Corpora for Cooperative Research in Topic Detection and Tracking: The TDT-2 and TDT-3 Corpus Efforts
This paper describes the creation and content of two corpora, TDT-2 and TDT-3, created for the DARPA-sponsored Topic Detection and Tracking project. The research goal of the TDT program is to create the core technology of a news understanding system that can process multilingual news content, categorizing individual stories according to the topic(s) they describe. The research tasks include segment...
Joint Image-Text News Topic Detection and Tracking with And-Or Graph Representation
In this paper, we aim to develop a method for automatically detecting and tracking topics in broadcast news. We present a hierarchical And-Or graph (AOG) to jointly represent the latent structure of both texts and visuals. The AOG embeds a context sensitive grammar that can describe the hierarchical composition of news topics by semantic elements about people involved, related places and what h...
A Cluster-based Approach to Broadcast News
We present an approach to detection and tracking of topics in multilingual broadcast news based upon a dynamic clustering scheme. Our approach derives from a system used to filter Web searches from multiple sources, with extensions for pipelining document clusters, part-of-speech tagging and extraction of named entities for use in an extended similarity measure.
Exploiting the Chronological Semantic Structure in a Large-scale Broadcast News Video Archive for its Efficient Exploration
Recent advances in digital storage technology have enabled us to archive more than 1,700 hours of video data from a daily Japanese news show over the last nine years. In this paper, to make effective use of the video data in the archive, we first present a news video structuring method based on the chronological semantic relations between stories, namely the “topic thread structure”. Next, we int...
Probabilistic models for topic detection and tracking
We present probabilistic models for use in detecting and tracking topics in broadcast news stories. Our information retrieval (IR) models are formally explained. The Topic Detection and Tracking (TDT) initiative is discussed. The application of probabilistic models to the topic detection and tracking tasks is developed, and enhancements are discussed. We discuss four variations of these models,...
Publication date: 2003